Use of Bimodal Coherence to Resolve Spectral Indeterminacy in Convolutive BSS
Abstract
Recent studies show that the visual information contained in visual speech can help improve the performance of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterisation of the coherence between the audio and visual speech using, e.g., a Gaussian mixture model (GMM). In this paper, we present two new contributions. First, an adapted expectation maximization (AEM) algorithm is proposed for the training process to model the audio-visual coherence on the extracted features. Second, the coherence is exploited to solve the permutation problem in the frequency domain using a new sorting scheme. We test our algorithm on the XM2VTS multimodal database. The experimental results show that the proposed algorithm outperforms traditional audio-only BSS.
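The permutation problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's AEM-trained GMM or its sorting scheme: the function name `align_bins` and the per-source reference activity `ref` are hypothetical, and a plain envelope correlation stands in for the audio-visual coherence score. The sketch only shows the general idea of choosing, in each frequency bin, the source ordering that best agrees with an external reference.

```python
import itertools
import numpy as np

def align_bins(Y, ref):
    """Reorder separated sources independently in each frequency bin.

    Y   : (n_src, n_freq, n_frames) separated spectra whose source order
          may differ from bin to bin (the permutation problem).
    ref : (n_src, n_frames) reference activity per source. ASSUMPTION:
          a stand-in for a visually derived coherence score, not the
          GMM likelihood used in the paper.
    Returns the aligned spectra and the permutation chosen per bin.
    """
    n_src, n_freq, _ = Y.shape
    perms = list(itertools.permutations(range(n_src)))
    aligned = np.empty_like(Y)
    chosen = []
    for f in range(n_freq):
        env = np.abs(Y[:, f, :])  # amplitude envelope of each output
        # score[i, j]: correlation of output i's envelope with reference j
        score = np.corrcoef(env, ref)[:n_src, n_src:]
        # exhaustive search over orderings; p[j] is the output assigned to source j
        best = max(perms, key=lambda p: sum(score[p[j], j] for j in range(n_src)))
        aligned[:, f, :] = Y[list(best), f, :]
        chosen.append(best)
    return aligned, chosen
```

The exhaustive search over orderings is only practical for a small number of sources, which is the usual setting for this kind of illustration.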
Similar resources
Use of bimodal coherence to resolve the permutation problem in convolutive BSS
Recent studies show that facial information contained in visual speech can be helpful for the performance enhancement of audio-only blind source separation (BSS) algorithms. Such information is exploited through the statistical characterization of the coherence between the audio and visual speech using, e.g., a Gaussian mixture model (GMM). In this paper, we present three contributions. With th...
Sparse filter models for solving permutation indeterminacy in convolutive blind source separation
Frequency-domain methods for estimating mixing filters in convolutive blind source separation (BSS) suffer from permutation and scaling indeterminacies in sub-bands. Solving these indeterminacies is critical to such BSS systems. In this paper, we propose to use sparse filter models to tackle the permutation problem. It will be shown that the ℓ1-norm of the filter matrix increases with permutat...
A Sparsity-Based Method to Solve Permutation Indeterminacy in Frequency-Domain Convolutive Blind Source Separation
Existing methods for frequency-domain estimation of mixing filters in convolutive blind source separation (BSS) suffer from permutation and scaling indeterminacies in sub-bands. However, if the filters are assumed to be sparse in the time domain, it is shown in this paper that the ℓ1-norm of the filter matrix increases as the sub-band coefficients are permuted. With this motivation, an algorith...
Audio-visual Convolutive Blind Source Separation
We present a novel method for separating speech from audio mixtures using the audio-visual coherence. It consists of two stages: in the off-line training process, we use the Gaussian mixture model to statistically characterise the audio-visual coherence with features obtained from the training set; at the separation stage, likelihood maximization is performed on the independent component a...
Minimal Distortion Principle for Blind Source Separation
Blind source separation (BSS) is a method for recovering a set of source signals from the observation of their mixtures without any prior knowledge about the mixing process. In BSS the definition of a source signal has an inherent indeterminacy; any linear transform of a source signal can also be considered a source signal. Due to this indeterminacy, there are an infinite number of valid separa...